Spatial Regression with Partial Differential Equation Regularization

Simone Panzeri @ fdaPDE Team, MOX, Department of Mathematics, Politecnico di Milano, Italy

The fdaPDE C++/R library (Palummo et al., 2025) provides advanced statistical methods for physics-informed spatial and functional data analysis, featuring regularization terms based on partial differential equations; for a review of this class of methods, refer to Sangalli (2021). The library can handle different types of data that may evolve over time on non-standard spatial domains, including linear networks, irregular planar regions, curved surfaces, and volumes. Among the broad range of modeling capabilities offered by fdaPDE, we focus here on the spatial regression method.

Spatial regression method

Let \(\mathcal{D} \subset \mathbb{R}^d\,,\) with \(d \geq 1\,,\) be a bounded spatial domain of interest in which \(n\) fixed measurement stations are located. Here, we consider \(d = 2\,,\) but the proposed method can handle multidimensional spatial domains with complex shapes, including two-dimensional curved surfaces (Ettinger et al., 2016) and non-convex three-dimensional volumes (Arnone et al., 2023). At the location \(\mathbf{p}_i =(p_{1i}, p_{2i})^\top \in \mathcal{D}\) of each station, we observe a realization \(y_i \in \mathbb{R}\) of the response variable \(Y_i\) under study, along with a set of covariates \(\mathbf{x}_i = \left( x_{i,1},\ldots,x_{i,q} \right)^\top \in \mathbb{R}^q\,,\) if available. We present a semiparametric spatial regression model with partial differential equation regularization, which we formulate as follows: \[y_i = \mathbf{x}_i^\top \boldsymbol{\beta} + f(\mathbf{p}_i) + \varepsilon_i\,, \qquad i = 1, \ldots, n\,,\] where:

To estimate the unknown parameters, we refer to the basic form of the regularized methods, as reviewed in Sangalli (2021) and originally introduced in Sangalli et al. (2013). These works propose to minimize the following penalized functional based on the sum of squared errors: \[\frac{1}{n} \sum_{i = 1}^n \left( y_i - \mathbf{x}_i^\top \boldsymbol{\beta} - f(\mathbf{p}_i) \right)^2 + R(f; \lambda)\,,\] where \(R(f; \lambda)\) is the regularization term that depends on the smoothing parameter \(\lambda \in \mathbb{R}^+\,.\) The regularization term is defined using Partial Differential Equations (PDEs), with differential operators encoding the smoothness of \(f\) over \(\mathcal{D}\,.\) This term may also incorporate problem-specific information into the modeling framework, whenever available. The functional above embodies the trade-off between data fidelity and model fidelity through the least-squares term and the misfit with respect to the PDE, respectively. The balance between these two contributions is governed by the smoothing parameter.

The regularization term is given by: \[R(f; \lambda) = \lambda \int_\mathcal{D} \left( L f - u \right)^2 \, \text{d}\mathbf{p}\,,\] where \(Lf = u\) is the PDE modeling the spatial variation of the unknown spatial field \(f\,.\) The PDE encodes problem-specific information derived from the physics of the phenomenon under study and formalizes it in terms of \(f\,.\) In the most general case, the method currently supports second-order linear elliptic PDEs, with diffusion-transport-reaction operators of the following form: \[Lf = -\nabla \cdot (K \nabla f) + \mathbf{b} \cdot \nabla f + cf\,,\] where:

Each of these parameters, along with the forcing term \(u \in L^2(\mathcal{D})\,,\) can vary over space and contribute to modeling different forms of anisotropy and non-stationarity. Consistent with the application presented in this vignette, we will focus on homogeneous forcing terms \(u \equiv 0\,.\) For non-homogeneous forcing terms, refer to Azzimonti et al. (2014), azzimonti2015blood and Arnone et al. (2019). If prior knowledge about the phenomenon under study is not available, the regularization term leads to stationary and isotropic smoothing through the Laplace operator \(\Delta\,,\) corresponding to \(K = I\,,\) \(\mathbf{b} = \mathbf{0}\,,\) and \(c = 0\,.\) Moreover, problem-specific conditions on \(f\) may be imposed at the boundary \(\partial\mathcal{D}\) of the spatial domain. The method currently supports Dirichlet, Neumann, and Robin boundary conditions, which respectively set the value of \(f\) at \(\partial\mathcal{D}\,,\) its normal derivative at \(\partial\mathcal{D}\,,\) or a combination of both. In addition, boundary conditions may vary over \(\partial\mathcal{D}\,.\)

The method for Spatial Regression with Partial Differential Equation regularization presented in this vignette, hereafter referred to as SR-PDE, is implemented within the fdaPDE2 C++/R library (Palummo et al., 2025). We load the library in the working environment as follows:

# Import the fdaPDE library
library(fdaPDE2)           # v. 2.0 (2025)
rm(list = ls())

# Load additional libraries and helper functions for plotting
source("utils/graphics.R")

Application to oceanographic quantities around the Florida peninsula

As a benchmark application in environmental sciences, we consider oceanographic data in the Gulf of Mexico, around the Florida peninsula. Tomasetto et al. (2024) presents this application in the context of spatial regression with differential regularization, proposing a parameter cascading approach for the estimation of PDE parameters, building on a first investigation by Bernardi et al. (2018). The dataset contains some indicators of water quality measured at 30 moored buoys, serving as monitoring stations, on April 1, 2020. In particular, we consider the Sea Surface Temperature (SST) and Dissolved Oxygen (DO) at standard depth levels. The dataset is freely and publicly available by different data sources, including the National Data Buoy Center and the National Centers for Environmental Information of the National Oceanic and Atmospheric Administration. SST is vital for marine ecosystems, weather prediction, and climate monitoring, influencing biological activity, atmospheric circulation, and heat absorption from climate change. On the other hand, DO is a key indicator of aquatic ecosystem health, essential for marine life and water quality. Monitoring its spatial distribution helps detect and address environmental issues early.

Spatial domain

First, we load the geometry of the spatial domain under study, created by preprocessing the Florida administrative boundary. This stage includes defining the specific region of interest located in the Gulf of Mexico, removing minor islands, and reducing the resolution of the coastlines. We imported the Florida administrative boundary using the tigris (Walker, 2025) R package, and performed subsequent preprocessing steps relying on the sf (Pebesma, 2025) and rmapshaper (Teucher et al., 2023) R packages.

The preprocessed geometry of the spatial domain is available from the file data/SRPDE_2D/SRPDE_2D_domain.shx and can be loaded through sf as follows.

## [SPATIAL DOMAIN]
domain <- st_read(dsn = "data/SRPDE_2D/domain/SRPDE_2D_domain.shx")
#> Reading layer `SRPDE_2D_domain' from data source 
#>   `/home/user/data/vignette/data/SRPDE_2D/domain/SRPDE_2D_domain.shx' 
#>   using driver `ESRI Shapefile'
#> Simple feature collection with 1 feature and 1 field
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -85.275 ymin: 23.305 xmax: -78.975 ymax: 30.655
#> Geodetic CRS:  WGS 84
domain
#> Simple feature collection with 1 feature and 1 field
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -85.275 ymin: 23.305 xmax: -78.975 ymax: 30.655
#> Geodetic CRS:  WGS 84
#>             NAME                       geometry
#> 1 Gulf of Mexico POLYGON ((-78.975 30.655, -...

The spatial domain can be interactively visualized using the mapview (Appelhans et al., 2023) R package.

# Interactive plot
mapview(domain,
        col.regions = "gray25", alpha.regions = 0.25, col = "black", lwd = 1.5,
        legend = FALSE, layer.name = "domain")

Figure 1: Spatial domain of interest – a portion of the Gulf of Mexico around the Florida peninsula.

Now we build a regular mesh of the spatial domain under consideration. We create a mesh based on boundary nodes and segments, and then we refine it by setting the maximum element area to 0.01 and the minimum angle to 20 degrees. To obtain the boundary nodes, we rely on the sf functionality, while the triangulation is constructed using the RTriangle (Shewchuk & Sterratt, 2025) R package. Other software can be used to get the triangulation; for example, it can be constructed through the fmesher (Lindgren, 2025) R package.

## [MESH]
# Define boundary nodes
boundary_nodes <- st_cast(x = domain, "POINT", crs = 4326)
boundary_nodes <- st_coordinates(x = boundary_nodes)
boundary_nodes <- data.frame(lon = boundary_nodes[,1], lat = boundary_nodes[,2])

# Remove the last node (duplicate node)
boundary_nodes <- boundary_nodes[-nrow(boundary_nodes),]
head(boundary_nodes)
#>         lon      lat
#> 1 -78.97500 30.65500
#> 2 -78.97500 23.30500
#> 3 -85.27500 23.30500
#> 4 -85.27500 29.69238
#> 5 -85.12147 29.71585
#> 6 -84.88178 29.73388

# Define boundary segments
boundary_segments <- cbind(1:nrow(boundary_nodes), c(2:nrow(boundary_nodes), 1))
head(boundary_segments)
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    2    3
#> [3,]    3    4
#> [4,]    4    5
#> [5,]    5    6
#> [6,]    6    7

# Create a planar straight line graph object
boundary_pslg <- pslg(P = boundary_nodes, S = boundary_segments)

# Create a regular mesh of the spatial domain
mesh_regular <- triangulate(p = boundary_pslg)
if (is.null(mesh_regular$H)) mesh_regular$H <- matrix(data = numeric(0), ncol = 2)

# Refine the mesh of the spatial domain
mesh_regular <- triangulate(p = mesh_regular, a = 0.01, q = 20, D = TRUE)
if (is.null(mesh_regular$H)) mesh_regular$H <- matrix(data = numeric(0), ncol = 2)

# Number of nodes
nrow(mesh_regular$P)
#> [1] 2851

# Number of triangles
nrow(mesh_regular$T)
#> [1] 5342

We convert the triangulation object from RTriangle so that it can be read by fdaPDE.

# Set up the triangulation for fdaPDE
mesh <- triangulation(nodes = mesh_regular$P, cells = mesh_regular$T,
                      boundary = mesh_regular$PB)

# Nodes coordinates
head(mesh$nodes)
#>           [,1]     [,2]
#> [1,] -78.97500 30.65500
#> [2,] -78.97500 23.30500
#> [3,] -85.27500 23.30500
#> [4,] -85.27500 29.69238
#> [5,] -85.12147 29.71585
#> [6,] -84.88178 29.73388

# Number of nodes
mesh$n_nodes
#> [1] 2851

# Edges
head(mesh$nodes)
#>           [,1]     [,2]
#> [1,] -78.97500 30.65500
#> [2,] -78.97500 23.30500
#> [3,] -85.27500 23.30500
#> [4,] -85.27500 29.69238
#> [5,] -85.12147 29.71585
#> [6,] -84.88178 29.73388

# Number of edges
mesh$n_edges
#> [1] 8192

# Triangles
head(mesh$cells)
#>      [,1] [,2] [,3]
#> [1,]   37  284  365
#> [2,] 1044  731 1047
#> [3,]  172  135  170
#> [4,]  162  240  242
#> [5,]   42   41   43
#> [6,]  155  177  154

# Number of triangles
mesh$n_cells
#> [1] 5342

# Bounding box
mesh$bbox
#>         [,1]   [,2]
#> [1,] -85.275 23.305
#> [2,] -78.975 30.655

We visualize the resulting mesh of the spatial domain of interest.

# Interactive plot
mapview(domain,
        col.regions = "gray25", alpha.regions = 0.25, col = "black", lwd = 1.5,
        legend = FALSE, layer = "domain") +
  mapview(mesh, crs = 4326,
          col.regions = "transparent", lwd = 1.25, legend = FALSE, layer = "mesh")

Figure 2: Regular mesh of the spatial domain of interest around the Florida peninsula with \(2\,851\) nodes and \(5\,342\) triangles.

We use the triangulation just created to define the spatial support of a geoframe object. This object will host layers containing data observed over the spatial support.

# Create the geoframe
florida <- geoframe(domain = mesh)
florida
#> Geoframe with 0 layers
#> Bounding box:    xmin: -85.275 ymin: 23.305 xmax: -78.975 ymax: 30.655
#> Number of nodes: 2851
#> Number of cells: 5342

Data

We import the preprocessed data from the file data/SRPDE_2D/SRPDE_2D_data.txt as a data.frame object. For the purpose of visualization, we convert this object into a sf object. Moreover, we add a data layer to the geoframe object defined above, thus obtaining a format compatible with the implementation of the proposed regression method.

## [DATA]
# Load the data
data <- read.table(file = "data/SRPDE_2D/SRPDE_2D_data.txt")
head(data)
#>      lon   lat    SST       DO   Nitrate  Phosphate Salinity Silicate
#> 1 -81.11 24.71 26.535 207.8540 0.1250918 0.09319023 36.08240 1.476339
#> 2 -80.40 25.00 25.661 208.5866 0.1152773 0.10779064 36.10134 1.840899
#> 3 -81.90 23.90 27.383 205.3182 0.1258188 0.11694750 36.12820 1.745409
#> 4 -83.60 24.60 25.540 208.7237 0.2219844 0.06881616 36.30968 1.074786
#> 5 -80.90 24.80 26.430 208.5866 0.1152773 0.10779064 36.11765 1.840899
#> 6 -81.10 24.30 27.037 207.8540 0.1250918 0.09319023 36.08949 1.476339

# Convert data into a sf object
data_sf <- st_as_sf(x = data, coords = c("lon", "lat"), crs = 4326)

# Add layer with data to the geoframe object
florida$insert(layer = "ocean", type = "point",
          geo = c("lon", "lat"), data = data)
florida
#> Geoframe with 1 layers
#> Bounding box:    xmin: -85.275 ymin: 23.305 xmax: -78.975 ymax: 30.655
#> Number of nodes: 2851
#> Number of cells: 5342
#> 
#> Layer: ocean
#> Type:  POINT
#> Dims: 30, 6
#> First 6 data rows:
#> SST     DO      Nitrate  Phosphate Salinity Silicate 
#> <flt64> <flt64> <flt64>  <flt64>   <flt64>  <flt64>  
#> 26.535  207.854 0.125092 0.0931902 36.0824  1.47634  
#> 25.661  208.587 0.115277 0.107791  36.1013  1.8409   
#> 27.383  205.318 0.125819 0.116948  36.1282  1.74541  
#> 25.54   208.724 0.221984 0.0688162 36.3097  1.07479  
#> 26.43   208.587 0.115277 0.107791  36.1176  1.8409   
#> 27.037  207.854 0.125092 0.0931902 36.0895  1.47634

The recorded values are shown in the interactive plot below, created using the mapview (Appelhans et al., 2023) R package.

# Interactive plot
mapview(florida[["ocean"]], crs = 4326,
        color_palettes = list("inferno", "viridis", "viridis", "viridis",
                              "viridis", "viridis"))

Figure 3: Sea Surface Temperature (SST), Dissolved Oxygen (DO), and concentrations of Nitrate, Phosphate, Salinity, and Silicate, observed at 30 monitoring stations located in the Gulf of Mexico, around the Florida peninsula, on April 1, 2020.

First, we focus on the SST. The non-standard shape of the spatial domain strongly influences the oceanographic quantity under study. For example, temperatures recorded at two buoys located on opposite sides of the Florida peninsula are naturally less correlated than those recorded at two buoys situated the same distance apart but on the same side of the peninsula. SR-PDE naturally accounts for the geometry of the domain, which features irregular boundaries and concavities.

SST satellite images

In addition to the data introduced above, the file data/SRPDE_2D/raster/SST.RData contains a raster (Hijmans, 2025) object with the SST values observed from high resolution satellite images provided by the Jet Propulsion Laboratory of the National Aeronautics and Space Administration (NASA).

# Load the SST data from NASA satellite images as a raster object
load("data/SRPDE_2D/raster/SST.RData")
SST
#> class      : RasterLayer 
#> dimensions : 735, 628, 461580  (nrow, ncol, ncell)
#> resolution : 0.01, 0.01  (x, y)
#> extent     : -85.275, -78.995, 23.305, 30.655  (xmin, xmax, ymin, ymax)
#> crs        : +proj=longlat +datum=WGS84 +no_defs 
#> source     : memory
#> names      : analysed.sea.surface.temperature 
#> values     : 293.588, 301.339  (min, max)
#> time       : 2020-04-01 09:00:00

The SST data from NASA satellite images appear as follows:

# Interactive plot
mapview(SST - 273.15, col.regions = inferno, na.color = "transparent",
        layer.name = "SST [°C]") +
  mapview(domain, col.regions = "transparent", alpha.regions = 0,
          col = "black", lwd = 1.5, layer.name = "domain", legend = FALSE)

Figure 4: SST data from NASA satellite images, used solely for comparison with the SST estimates computed by the proposed SR-PDE method in the next section.

We point out that these data are not used for the purpose of estimation; they are displayed here only to enable comparison with respect to the SST estimate provided by the proposed SR-PDE method in the next section. Instead, the estimation is performed using the 30 observations measured at the moored buoys, contained in the data variable loaded above.

SR-PDE model fitting for isotropic smoothing (without covariates)

Now we are ready to perform the SR-PDE spatial regression method using the sr function. This function offers several options for solving the regression problem, including the possibility to specify covariates, PDE parameters, and boundary conditions, whenever available. The function supports several distributions within the exponential family for generalized linear models and can handle areal data. In addition, the function implements different algorithms for degrees of freedom computation and criteria for optimal smoothing parameter selection. For further information, see the documentation in the Help by typing ?sr.

Here, in its simplest form, we apply isotropic smoothing to the SST data, hereafter referred to as SR-PDE\((I,\mathbf{0},0)\,.\) This means that we consider:

Isotropic smoothing with fixed smoothing parameter

We can fit the model with a fixed proposed value for the smoothing parameter \(\lambda\,.\) For instance, we can set it to \(0.00001\,,\) and compute the estimate as follows:

# Set up the finite element function
f_SST_iso_fixed <- fe_function(mesh, type = "P1")

# Proposed value for the smoothing parameter
lambda_fixed <- 0.00001

# Isotropic smoothing model
model_SST_iso_fixed <- sr(formula = SST ~ f_SST_iso_fixed, data = florida)

# Isotropic smoothing fit with fixed lambda
fit_SST_iso_fixed <- model_SST_iso_fixed$fit(lambda = lambda_fixed)

We print the regression model outputs.

# Fitted values at mesh nodes
head(f_SST_iso_fixed$coeff)
#> [1] 24.87297 25.09173 29.04511 24.88935 24.88134 24.84066

# Fitted values at locations
head(model_SST_iso_fixed$fitted)
#> [1] 26.56576 25.65666 27.37757 25.55524 26.42243 27.01873

# Residuals at locations: response - fitted values
SST_iso_fixed_residuals <- c(florida[["ocean"]]$SST - model_SST_iso_fixed$fitted)
summary(SST_iso_fixed_residuals)
#>       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
#> -0.0685988 -0.0077553  0.0009632  0.0000000  0.0071528  0.0509515

To evaluate the regression model fit over a fine grid and enable interactive visualization of the estimate, we internally create a new raster object. We then compute the fitted values over the grid using the $eval method. We plot the resulting estimate with mapview: this is just one possibility; various other plotting options are available depending on the specific purpose.

The regression model fit over the region of interest is instead displayed below (left). For qualitative comparison, the SST data from NASA satellite images is also displayed (right).

# Interactive plot
map1 <- mapview(f_SST_iso_fixed, crs = 4326, col.regions = inferno,
                na.color = "transparent", layer.name = "SST [°C]") +
  mapview(domain, col.regions = "transparent", alpha.regions = 0,
          col = "black", lwd = 1.5, layer.name = "domain", legend = FALSE)

map2 <- mapview(SST - 273.15, col.regions = inferno, na.color = "transparent",
                layer.name = "SST [°C]") +
  mapview(domain, col.regions = "transparent", alpha.regions = 0,
          col = "black", lwd = 1.5, layer.name = "domain", legend = FALSE)

sync(map1, map2)

Figure 5: SST estimate provided by the isotropic SR-PDE\((I,\mathbf{0},0)\) (without covariates) with \(\lambda = 0.001\) (left) and SST data from NASA satellite images (right). We recall that the latter are not used for estimation purposes; they are displayed here solely to allow comparison with the SST estimate computed by the proposed SR-PDE method based on the 30 observations measured at the moored buoys.

The smoothing fit is not accurate compared to the satellite image data in most regions, resulting in overly smoothed behavior. This is likely due to the value selected of the smoothing parameter \(\lambda\,,\) which is clearly not optimal here.

SR-PDE model fitting for physics-informed smoothing (without covariates)

The phenomenon under study is heavily influenced by the Gulf Stream, a warm, swift, and non-stationary current in the Atlantic Ocean. The proposed SR-PDE method is well-suited for this type of data analysis, as it allows for the incorporation of physical knowledge into the modeling framework. In fact, the influence of the Gulf Stream on the spatial distribution of the considered oceanographic quantity can be properly captured by a diffusion-transport PDE. Thus, we apply physics-informed smoothing to the SST data, hereafter referred to as SR-PDE\((K,\mathbf{b},0)\,.\) This means that we consider:

We specify that the reaction coefficient can be set to zero, as the shrinkage effect is not relevant to the phenomenon under study. Moreover, the relatively small value of the diffusion intensity \(\eta\,,\) compared to the values assumed by \(\mathbf{b}(\mathbf{p})\) over the domain of interest, highlights that the transport term has a significant impact on the phenomenon under study, as it dominates the diffusion effect.

PDE parameters

We now define PDE parameters. To this aim, the evaluation of the transport term at any arbitrary location \((x,y)\) is performed by extracting values from the files data/SRPDE_2D/raster/Transport_x.RData and data/SRPDE_2D/raster/Transport_y.RData using the raster (Hijmans, 2025) R package. These data – specifically, the direction and intensity of the Gulf Stream around the Florida peninsula – are provided by the Ocean Surface Current Analysis Real-time (OSCAR) dataset, with a 5-day resolution. The following get_transport_x and get_transport_y functions extract the horizontal and vertical components at \((x,y)\,,\) and the get_transport_coeff function extracts the magnitude of the transport term at \((x,y)\,.\)

# Load the horizontal component of the transport term as a raster object
load("data/SRPDE_2D/raster/Transport_x.RData")
Transport_x
#> class      : RasterLayer 
#> dimensions : 42, 60, 2520  (nrow, ncol, ncell)
#> resolution : 0.3333333, 0.3333333  (x, y)
#> extent     : -98.83333, -78.83333, 17.5, 31.5  (xmin, xmax, ymin, ymax)
#> crs        : +proj=longlat +datum=WGS84 +no_defs 
#> source     : memory
#> names      : layer 
#> values     : -0.721745, 1.086167  (min, max)

# Load the vertical component of the transport term as a raster object
load("data/SRPDE_2D/raster/Transport_y.RData")
Transport_y
#> class      : RasterLayer 
#> dimensions : 42, 60, 2520  (nrow, ncol, ncell)
#> resolution : 0.3333333, 0.3333333  (x, y)
#> extent     : -98.83333, -78.83333, 17.5, 31.5  (xmin, xmax, ymin, ymax)
#> crs        : +proj=longlat +datum=WGS84 +no_defs 
#> source     : memory
#> names      : layer 
#> values     : -0.982114, 1.332888  (min, max)

# Get horizontal component of the transport term at (x,y)
get_transport_x <- function(x, y){
  value = extract(Transport_x, cbind(x, y))
  return(value)
}

# Get vertical component of the transport term at (x,y)
get_transport_y <- function(x, y){
  value = extract(Transport_y, cbind(x, y))
  return(value)
}

# Get the magnitude of the transport term at (x,y)
get_transport_coeff <- function(x, y){
  value = (get_transport_x(x = x, y = y))^2 + (get_transport_y(x = x, y = y))^2
  return(sqrt(value))
}

The transport term can be visualized as follows:

# Create a data.frame with transport term values
df <- as.data.frame(coordinates(SST))
names(df) <- c("lon", "lat")
df$Transport_x <- ifelse(is.na(values(SST)), NA,
                         get_transport_x(x = df$lon, y = df$lat))
df$Transport_y <- ifelse(is.na(values(SST)), NA,
                         get_transport_y(x = df$lon, y = df$lat))
df$Transport_coeff <- ifelse(is.na(values(SST)), NA,
                             get_transport_coeff(x = df$lon, y = df$lat))

# Thin out the number of vectors
df <- df %>% filter(row_number() %% 1000 == 0)

## [STADIA MAPS API KEY]
# Insert here your API key; for link and instruction: ??register_stadiamaps
# register_stadiamaps(key = "---your API key---", write = TRUE)

# Compute the bounding box
bbox <- extent(SST)

# Static plot
ggmap(get_stadiamap(bbox = c(left = bbox@xmin, bottom = bbox@ymin,
                             right = bbox@xmax, top = bbox@ymax),
                    zoom = 7,
                    maptype = "alidade_smooth",
                    color = "bw")) +
  geom_segment(aes(xend = lon + Transport_x, yend = lat + Transport_y,
                   colour = Transport_coeff),
               data = df,
               arrow = arrow(angle = 23, length = unit(0.13, "inches")),
               size = 1.1,
               alpha = 1) +
  scale_color_gradientn(colours = rocket(n = 100, end = 0.87)) +
  theme(legend.position = "right",
        legend.title = element_text(size = 10),
        legend.text = element_text(size = 8),
        text = element_text(size = 14)) +
  labs(color = "Velocity [m/s]") +
  theme(text = element_text(size = 14)) +
  guides(colour = guide_colourbar(barheight = unit(7, "cm"),
                                  barwidth = unit(0.55, "cm"),
                                  raster = TRUE, ticks = FALSE))

Figure 8: Gulf Stream velocity around the Florida peninsula.

Thus, we can define:

# Diffusion tensor
K <- 0.0218 * diag(2)

# Space-varying transport vector
b <- function(points){
  output = array(data = 0, c(nrow(points),2))
  for (i in 1:nrow(points)){
    output[i,1] = get_transport_x(x = points[i,1], y = points[i,2])
    output[i,2] = get_transport_y(x = points[i,1], y = points[i,2])
  }
  return(output)
}

We do not specify the reaction coefficient or the forcing term above, meaning that both are set to 0 by default when setting up the sr model.

Physics-informed smoothing with optimization of the smoothing parameter via Generalized Cross-Validation (GCV) minimization using Newton’s method

We perform the physics-informed SR-PDE\((K,\mathbf{b},0)\) method. Here, the optimal smoothing parameter is selected via GCV minimization using Newton’s method. This technique needs the exact computation of the degrees of freedom.

# Proposed values for the smoothing parameter
lambda_grid <- 10^seq(from = -6, to = -2, by = 0.2)

# Set up the finite element function
f_SST_physics <- fe_function(mesh, type = "P1")

# Physics-informed smoothing model
model_SST_physics <- sr(formula = SST ~ f_SST_physics, data = florida,
                        penalty = fe_elliptic(K = K, b = b))

# Physics-informed smoothing fit with Newton's method for GCV minimization
fit_SST_physics <- model_SST_physics$fit(calibrator = 
  gcv(optimizer = grid_search(grid = lambda_grid))
)

We print the regression model outputs.

# Fitted values at mesh nodes
head(f_SST_physics$coeff)
#> [1] 25.76334 24.35116 27.50629 23.95429 23.94322 23.88667

# Fitted values at locations
head(model_SST_physics$fitted)
#> [1] 26.50095 25.69924 27.34873 25.58107 26.41394 26.93354

# Residuals at locations: response - fitted values
SST_physics_residuals <- c(florida[["ocean"]]$SST - model_SST_physics$fitted)
summary(SST_physics_residuals)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#> -0.153837 -0.015319  0.001909  0.000000  0.015671  0.103457

# Optimal value selected for the smoothing parameter
lambda_opt_physics <- fit_SST_physics$optimum
lambda_opt_physics
#> [1] 0.01

The regression model fit over the region of interest is instead displayed below (bottom right). For qualitative comparison, the SST data from NASA satellite images is also displayed (top right).

# Interactive plot
map1 <- mapview(florida[["ocean"]], varnames = "SST", crs = 4326, layer.name = "SST [°C]",
                color_palettes = list("inferno"), na.color = "transparent") +
  mapview(domain, col.regions = "transparent", alpha.regions = 0,
          col = "black", lwd = 1.5, layer.name = "domain", legend = FALSE)

map2 <- mapview(SST - 273.15, col.regions = inferno, na.color = "transparent",
                layer.name = "SST [°C]") +
  mapview(domain, col.regions = "transparent", alpha.regions = 0,
          col = "black", lwd = 1.5, layer.name = "domain", legend = FALSE)

map3 <- mapview(f_SST_iso_grid, crs = 4326, col.regions = inferno,
                na.color = "transparent", layer.name = "SST [°C]") +
  mapview(domain, col.regions = "transparent", alpha.regions = 0,
          col = "black", lwd = 1.5, layer.name = "domain", legend = FALSE)

map4 <- mapview(f_SST_physics, crs = 4326, col.regions = inferno,
                na.color = "transparent", layer.name = "SST [°C]") +
  mapview(domain, col.regions = "transparent", alpha.regions = 0,
          col = "black", lwd = 1.5, layer.name = "domain", legend = FALSE)

sync(map1, map2, map3, map4, ncol = 2)

Figure 9: SST data observed at 30 monitoring stations located in the Gulf of Mexico (top left); SST data from NASA satellite images (top right); SST estimate provided by the isotropic SR-PDE\((I,\mathbf{0},0)\) with \(\lambda\) selected via GCV minimization using Netwon’s method (bottom left); SST estimate provided by the physics-informed SR-PDE\((K,\mathbf{b},0)\) with \(\lambda\) selected via GCV minimization using Netwon’s method (bottom right). We recall that the latter are not used for estimation purposes; they are displayed here solely to allow comparison with the SST estimate computed by the proposed SR-PDE method based on the 30 observations measured at the moored buoys.

This model fitting proves to capture the strongly anisotropic and non-stationary pattern of the SST. By leveraging problem-specific information – specifically, the presence of the Gulf Stream – the proposed SR-PDE\((K,\mathbf{b},0)\) method accurately estimates the spatial distribution of the quantity under study using only the buoy measurements.

This insight is further supported by the boxplot of the residuals from the two methods, as shown in the plot below. For the plot of the root mean square error in cross-validation, refer to Tomasetto et al. (2024).

# Boxplot
par(family = "serif")
boxplot(SST_iso_grid_residuals, SST_physics_residuals,
        col = rocket(2), ylab = "residuals")
axis(1, at = 1:2, line = 0.5, tick = FALSE,
     labels = c("ISOTROPIC SR-PDE\n (GRID)", "PHYSICS-INFORMED SR-PDE\n (NEWTON)"))

Figure 10: Boxplot of the residuals from the isotropic SR-PDE\((I,\mathbf{0},0)\), with \(\lambda\) selected via GCV minimization using grid search (first) and Newton’s method (second), and from the physics-informed SR-PDE\((K,\mathbf{b},0)\) (third).

SR-PDE model fitting for physics-informed smoothing (with covariates)

The SR-PDE model can incorporate covariates associated with the corresponding observed data values. We present an example based on the Dissolved Oxygen (DO) data, measured in \(\mu\,\text{mol} / \text{kg}\), and modeled using the Sea Surface Temperature (SST) as a covariate. The recorded values for SST and DO are shown in the interactive plot below.

# Interactive plot
map1 <- mapview(florida[["ocean"]], varnames = "SST", crs = 4326,
                color_palettes = list("inferno"), na.color = "transparent",
                layer.name = "SST [°C]") +
  mapview(domain, col.regions = "transparent", alpha.regions = 0,
          col = "black", lwd = 1.5, layer.name = "domain", legend = FALSE)

map2 <- mapview(florida[["ocean"]], varnames = "DO", crs = 4326,
                color_palettes = list("viridis"), na.color = "transparent",
                layer.name = "DO") +
  mapview(domain, col.regions = "transparent", alpha.regions = 0,
          col = "black", lwd = 1.5, layer.name = "domain", legend = FALSE)

sync(map1, map2)

Figure 11: SST (left) and DO (right) observed at 30 monitoring stations located in the Gulf of Mexico, around the Florida peninsula, on April 1, 2020.

DO is strongly negatively correlated with SST, with a Pearson correlation coefficient of approximately −0.7125017.

cor(x = florida[["ocean"]]$SST, y = florida[["ocean"]]$DO, method = "pearson")
#> [1] -0.7125017

As water temperature rises, molecular motion increases, reducing the solubility of oxygen and leading to lower dissolved oxygen levels. Thus, we apply physics-informed smoothing to the DO data, including the SST data as a covariate into the modeling framework. This means that we consider:

We modify the diffusion intensity value accordingly, while keeping all other PDE parameters unchanged from those used in the SR-PDE\((K,\mathbf{b},0)\) method applied to the SST data.

# Diffusion tensor
K <- 1.1341 * diag(2)

We perform the anisotropic SR-PDE\((K,\mathbf{b},0)\) method. Here, the optimal smoothing parameter is selected via GCV minimization using Grid Search method. This technique consier the stochastic computation of the degrees of freedom.

# Set up the finite element function
f_DO_physics <- fe_function(mesh, type = "P1")

# Physics-informed smoothing model
model_DO_physics <- sr(formula = DO ~ SST + f_DO_physics, data = florida,
                        penalty = fe_elliptic(K = K, b = b))

# Physics-informed smoothing fit Grid Search method for GCV minimization
fit_DO_physics <- model_DO_physics$fit(
  calibrator = gcv(optimizer = grid_search(lambda_grid))
)

We print the regression model outputs.

# Fitted values at mesh nodes
head(f_DO_physics$coeff)
#> [1] 221.2550 204.9870 205.0861 227.4906 227.6271 228.3258

# Fitted values at locations
head(model_DO_physics$fitted)
#> [1] 207.8663 208.5899 205.3264 208.7254 208.5862 207.8473

# Residuals at locations: response - fitted values
DO_physics_residuals <- c(florida[["ocean"]]$DO - model_DO_physics$fitted)
summary(DO_physics_residuals)
#>       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
#> -0.0159428 -0.0032196  0.0002942  0.0000000  0.0035508  0.0139123

# Optimal value selected for the smoothing parameter
lambda_opt_physics <- fit_DO_physics$optimum
lambda_opt_physics
#> [1] 1e-06

# Estimate of parameter beta
model_DO_physics$beta
#> [1] 0.04724586

The regression model fit over the region of interest is instead displayed below.

# Interactive plot
map1 <- mapview(f_DO_physics, crs = 4326, col.regions = viridis,
                na.color = "transparent", layer.name = "DO") +
  mapview(domain, col.regions = "transparent", alpha.regions = 0,
          col = "black", lwd = 1.5, layer.name = "domain", legend = FALSE)

map2 <- mapview(florida[["ocean"]], varnames = "DO", crs = 4326,
                color_palettes = list("viridis"), na.color = "transparent",
                layer.name = "DO") +
  mapview(domain, col.regions = "transparent", alpha.regions = 0,
          col = "black", lwd = 1.5, layer.name = "domain", legend = FALSE)

sync(map1, map2)

Figure 12: SST estimate provided by the physics-informed SR-PDE\((K,\mathbf{b},0)\) with \(\lambda\) selected via GCV minimization using Netwon’s method and using SST as a covariate (left); DO data observed at 30 monitoring stations located in the Gulf of Mexico (right).

This model fitting successfully proves to capture the strongly anisotropic and non-stationary pattern exhibited by the DO data, leveraging both the physics of the underlying phenomenon and information from the SST data.

References

Appelhans, T., Detsch, F., Reudenbach, C., & Woellauer, S. (2023). Mapview: Interactive viewing of spatial data in R. url: https://CRAN.R-project.org/package=mapview.
Arnone, E., Azzimonti, L., Nobile, F., & Sangalli, L. M. (2019). Modeling spatially dependent functional data via regression with differential regularization. Journal of Multivariate Analysis, 170, 275–295. https://doi.org/10.1016/j.jmva.2018.09.006
Arnone, E., Negri, L., Panzica, F., & Sangalli, L. M. (2023). Analyzing data in complicated 3D domains: Smoothing, semiparametric regression, and functional principal component analysis. Biometrics, 79(4), 3510–3521. https://doi.org/10.1111/biom.13845
Azzimonti, L., Nobile, F., Sangalli, L. M., & Secchi, P. (2014). Mixed finite elements for spatial regression with PDE penalization. SIAM/ASA Journal on Uncertainty Quantification, 2(1), 305–335. https://doi.org/10.1137/130925426
Bernardi, M. S., Carey, M., Ramsay, J. O., & Sangalli, L. M. (2018). Modeling spatial anisotropy via regression with partial differential regularization. Journal of Multivariate Analysis, 167, 15–30. https://doi.org/10.1016/j.jmva.2018.03.014
Ettinger, B., Perotto, S., & Sangalli, L. M. (2016). Spatial regression models over two-dimensional manifolds. Biometrika, 103(1), 71–88.
Hijmans, R. J. (2025). Raster: Geographic data analysis and modeling. url: https://CRAN.R-project.org/package=raster.
Lindgren, F. (2025). Fmesher: Triangle meshes and related geometry tools. url: https://CRAN.R-project.org/package=fmesher.
Palummo, A., Clemente, A., Sangalli, E., Arnone, Formaggia, L., & Maria, L. (2025). fdaPDE: Physics-informed statistical learning. url: https://CRAN.R-project.org/package=fdaPDE.
Pebesma, E. (2025). Sf: Simple Features for R. url: https://CRAN.R-project.org/package=sf.
Sangalli, L. M. (2021). Spatial regression with partial differential equation regularisation. International Statistical Review, 89(3), 505–531. https://doi.org/10.1111/insr.12444
Sangalli, L. M., Ramsay, J. O., & Ramsay, T. O. (2013). Spatial spline regression models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 75(4), 681–703. https://doi.org/10.1111/rssb.12009
Shewchuk, J. R., & Sterratt, D. C. (2025). RTriangle - a 2D quality mesh generator and Delaunay triangulator. url: https://CRAN.R-project.org/package=RTriangle.
Teucher, A., Russell, K., & Bloch, M. (2023). Rmapshaper: Client for ’mapshaper’ for ’geospatial’ operations. url: https://CRAN.R-project.org/package=rmapshaper.
Tomasetto, M., Arnone, E., & Sangalli, L. M. (2024). Modeling anisotropy and non-stationarity through physics-informed spatial regression. Environmetrics, 35(8). https://doi.org/10.1002/env.2889
Walker, K. (2025). Tigris: Load census TIGER/line shapefiles. url: https://CRAN.R-project.org/package=tigris.